Indexing and Searching Mathematics in Digital Libraries
Internal identifier: 000388 (Main/Exploration); previous: 000387; next: 000389
Authors: Petr Sojka [Czech Republic]; Martin Líška [Czech Republic]
Source:
- Lecture Notes in Computer Science [ 0302-9743 ] ; 2011.
Abstract
This paper surveys approaches and systems for searching mathematical formulae in mathematical corpora and on the web. The design and architecture of our MIaS (Math Indexer and Searcher) system is presented, and our design decisions are discussed in detail. An approach based on Presentation MathML using a similarity of math subformulae is suggested and verified by implementing it as a math-aware search engine based on the state-of-the-art system, Apache Lucene. Scalability issues were checked based on 324,000 real scientific documents from the arXiv archive with 112 million mathematical formulae. More than two billion MathML subformulae were indexed using our Solr-compatible Lucene extension.
DOI: 10.1007/978-3-642-22673-1_16
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000687
- to stream Istex, to step Curation: 000679
- to stream Istex, to step Checkpoint: 000044
- to stream Main, to step Merge: 000393
- to stream Main, to step Curation: 000388
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Indexing and Searching Mathematics in Digital Libraries</title>
<author><name sortKey="Sojka, Petr" sort="Sojka, Petr" uniqKey="Sojka P" first="Petr" last="Sojka">Petr Sojka</name>
</author>
<author><name sortKey="Liska, Martin" sort="Liska, Martin" uniqKey="Liska M" first="Martin" last="Líška">Martin Líška</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:0FF53C9AF09F36F0742CA95BA5ECF84248EB2FA8</idno>
<date when="2011" year="2011">2011</date>
<idno type="doi">10.1007/978-3-642-22673-1_16</idno>
<idno type="url">https://api.istex.fr/document/0FF53C9AF09F36F0742CA95BA5ECF84248EB2FA8/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000687</idno>
<idno type="wicri:Area/Istex/Curation">000679</idno>
<idno type="wicri:Area/Istex/Checkpoint">000044</idno>
<idno type="wicri:doubleKey">0302-9743:2011:Sojka P:indexing:and:searching</idno>
<idno type="wicri:Area/Main/Merge">000393</idno>
<idno type="wicri:Area/Main/Curation">000388</idno>
<idno type="wicri:Area/Main/Exploration">000388</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Indexing and Searching Mathematics in Digital Libraries</title>
<author><name sortKey="Sojka, Petr" sort="Sojka, Petr" uniqKey="Sojka P" first="Petr" last="Sojka">Petr Sojka</name>
<affiliation wicri:level="3"><country xml:lang="fr">République tchèque</country>
<wicri:regionArea>Faculty of Informatics, Masaryk University, Botanická 68a, 602 00, Brno</wicri:regionArea>
<placeName><settlement type="city">Brno</settlement>
<region>Moravie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">République tchèque</country>
</affiliation>
</author>
<author><name sortKey="Liska, Martin" sort="Liska, Martin" uniqKey="Liska M" first="Martin" last="Líška">Martin Líška</name>
<affiliation wicri:level="3"><country xml:lang="fr">République tchèque</country>
<wicri:regionArea>Faculty of Informatics, Masaryk University, Botanická 68a, 602 00, Brno</wicri:regionArea>
<placeName><settlement type="city">Brno</settlement>
<region>Moravie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">République tchèque</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2011</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">0FF53C9AF09F36F0742CA95BA5ECF84248EB2FA8</idno>
<idno type="DOI">10.1007/978-3-642-22673-1_16</idno>
<idno type="ChapterID">16</idno>
<idno type="ChapterID">Chap16</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: This paper surveys approaches and systems for searching mathematical formulae in mathematical corpora and on the web. The design and architecture of our MIaS (Math Indexer and Searcher) system is presented, and our design decisions are discussed in detail. An approach based on Presentation MathML using a similarity of math subformulae is suggested and verified by implementing it as a math-aware search engine based on the state-of-the-art system, Apache Lucene. Scalability issues were checked based on 324,000 real scientific documents from the arXiv archive with 112 million mathematical formulae. More than two billion MathML subformulae were indexed using our Solr-compatible Lucene extension.</div>
</front>
</TEI>
<affiliations><list><country><li>République tchèque</li>
</country>
<region><li>Moravie</li>
</region>
<settlement><li>Brno</li>
</settlement>
</list>
<tree><country name="République tchèque"><region name="Moravie"><name sortKey="Sojka, Petr" sort="Sojka, Petr" uniqKey="Sojka P" first="Petr" last="Sojka">Petr Sojka</name>
</region>
<name sortKey="Liska, Martin" sort="Liska, Martin" uniqKey="Liska M" first="Martin" last="Líška">Martin Líška</name>
<name sortKey="Liska, Martin" sort="Liska, Martin" uniqKey="Liska M" first="Martin" last="Líška">Martin Líška</name>
<name sortKey="Sojka, Petr" sort="Sojka, Petr" uniqKey="Sojka P" first="Petr" last="Sojka">Petr Sojka</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000388 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000388 | SxmlIndent | more
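Once a record has been extracted, it can also be read programmatically. The following is only a minimal sketch in Python, not part of Dilib: it assumes the extracted text is a single well-formed <record> element like the one shown above. Because the record uses the wicri: prefix without declaring it, a strict XML parser will reject it, so the sketch binds the prefix to a placeholder URI first. The names parse_record, summarize, SAMPLE and the URI urn:example:wicri are hypothetical and chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder namespace URI (an assumption, not an official identifier):
# the record uses the "wicri:" prefix without declaring it, and a strict
# XML parser refuses undeclared prefixes.
WICRI_NS = "urn:example:wicri"


def parse_record(xml_text):
    """Parse the text of one <record> element and return its root element."""
    # Bind the undeclared prefix on the first <record> start tag only.
    patched = re.sub(
        r"<record\b", '<record xmlns:wicri="%s"' % WICRI_NS, xml_text, count=1
    )
    return ET.fromstring(patched)


def summarize(record):
    """Collect title, author names and DOI from the TEI header of a record."""
    title = record.findtext(".//titleStmt/title")
    authors = [name.text for name in record.findall(".//titleStmt/author/name")]
    doi = record.findtext(".//publicationStmt/idno[@type='doi']")
    return {"title": title, "authors": authors, "doi": doi}


# Abridged copy of the record above, kept short for illustration.
SAMPLE = """<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Indexing and Searching Mathematics in Digital Libraries</title>
<author><name sortKey="Sojka, Petr">Petr Sojka</name>
</author>
<author><name sortKey="Liska, Martin">Martin Líška</name>
</author>
</titleStmt>
<publicationStmt><idno type="doi">10.1007/978-3-642-22673-1_16</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

if __name__ == "__main__":
    print(summarize(parse_record(SAMPLE)))
```

To use this on real output, the text produced by one of the HfdSelect commands above could be saved to a file and passed to parse_record, provided that output is exactly one <record> element; that behaviour is assumed here rather than confirmed.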
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:0FF53C9AF09F36F0742CA95BA5ECF84248EB2FA8 |texte= Indexing and Searching Mathematics in Digital Libraries }}
This area was generated with Dilib version V0.6.32.